Flat Topic List
This is an unsorted list of all possible topics; see Topics (Summer 2024) for a curated overview.
Application-Specific Conflict Resolution
In decentralized systems, conflicts are commonplace. They can arise whenever one peer performs an operation that invalidates some concurrent operation by another peer. There exist a multitude of different conflict resolution strategies: For example, we can simply keep the conflict and let the users fix it (git), we can prevent the conflict from happening by enforcing a consensus between the peers, or we can define an application-specific strategy that resolves conflicts (e.g. adds win over removes).
Most distributed applications pick one conflict resolution strategy for the whole application, usually determined by the application's consistency level. However, such a global strategy is often too permissive (leading to concurrency bugs) or too restrictive (slowing down the application and preventing offline availability). Ideally, we would pick just the right strategy for each part of the application, being permissive where we can and strict where needed.
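As a concrete illustration of an application-specific strategy, the "adds win over removes" rule can be sketched as a small observed-remove set. The class and tagging scheme below are illustrative, not taken from any particular library:

```python
# Sketch of an add-wins (observed-remove) set: each add is tagged with
# a unique id, and a remove only cancels the tags it has seen, so a
# concurrent add survives the merge.
import uuid

class AddWinsSet:
    def __init__(self):
        self.adds = set()      # (element, unique_tag) pairs observed as added
        self.removes = set()   # tags observed as removed

    def add(self, element):
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Cancel only the add-tags this replica has *seen*; a concurrent
        # add carries a fresh tag and therefore wins after merging.
        self.removes |= {tag for (e, tag) in self.adds if e == element}

    def merge(self, other):
        merged = AddWinsSet()
        merged.adds = self.adds | other.adds
        merged.removes = self.removes | other.removes
        return merged

    def value(self):
        return {e for (e, tag) in self.adds if tag not in self.removes}
```

Here a concurrent add and remove of the same element resolve to the element being present, without any coordination between the peers.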
As part of this topic, you will analyse different conflict-resolution strategies and evaluate them in the context of local-first software.
- Blazes: Coordination analysis for distributed programs
- Replicated Data Consistency Explained Through Baseball
- Rich-CRDTs
- Tackling Consistency-related Design Challenges of Distributed Data-Intensive Systems - An Action Research Study
- A Classification of Replicated Data for the Design of Eventually Consistent Domain Models
Mergeable Replicated Data Types
This is a reproduction project of an existing paper, where the goal is to reproduce and confirm or deny the results of the original paper.
MRDTs propose a replication strategy where existing data structures are converted to a universal set representation, which can be automatically replicated. The claims are that this is simpler and faster than the alternatives, but is this true?
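The core of the set-based replication idea can be sketched as a three-way merge against the lowest common ancestor. This is a simplified illustration of the approach, not the paper's exact construction:

```python
# MRDT-style three-way merge on the set representation: keep what both
# sides kept, plus whatever either side added relative to the lowest
# common ancestor (lca). Deletions on either side are respected because
# a deleted element is no longer in that side's set.
def merge(lca, left, right):
    return (left & right) | (left - lca) | (right - lca)
```

For example, if one replica removes an element while the other adds a new one, the merge keeps the addition and drops the removed element.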
- Mergeable replicated data types
- Bolt on Convergence for MRDTs
- Certified mergeable replicated data types
On CRDTs in Byzantine Environments
In a Byzantine peer-to-peer system, other nodes are actively trying to attack the system. Martin Kleppmann has noted that replicated data types are already well positioned to deal with such attacks, by recording a history of all operations and protecting that history with cryptographic checksums. This is the concept underlying the Matrix protocol – so the concept works – but does it generalize to other replicated data types?
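The history-protection idea can be sketched as a simple hash chain over the operation log; the encoding below is illustrative, real systems hash structured operations and use signed, DAG-shaped histories:

```python
# Sketch: a running hash over an operation history. A Byzantine peer
# that rewrites any earlier operation changes the final hash, so honest
# peers can detect the tampering by comparing digests.
import hashlib

def chain(history):
    """Return the running SHA-256 digest over a list of operation strings."""
    h = "genesis"
    for op in history:
        h = hashlib.sha256((h + op).encode()).hexdigest()
    return h
```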
Testing Local-First Software
Testing software becomes much harder in a local-first setting, because it is insufficient to test good/bad results of operations in isolation; one must also consider concurrent execution of other operations. Moreover, for many local-first applications, it is not even clear what a “correct” result is. For example, if two users each concurrently add three new items to their shared shopping list, which of the possible resulting lists are correct?
A promising strategy is “property based testing”, where developers define abstract properties that should always be true (such as “the shopping list should contain each added item but no duplicates”), while the testing system automatically generates example executions and then checks the defined properties. However, for effective testing, the generated example executions should exercise many potential problems, which is still a mostly unsolved problem.
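The shopping-list property from above can be sketched as a hand-rolled property-based test; the toy `merge` function and generator are illustrative, a real tool like QuickCheck or Hypothesis would also shrink counterexamples:

```python
# Sketch of a property-based test: randomly generate concurrent
# executions (two replicas adding items to a shared base list), merge
# them, and check the property "every added item appears exactly once".
import random

def merge(base, left, right):
    # toy union-based merge that preserves first-seen order
    result = list(base)
    for item in left + right:
        if item not in result:
            result.append(item)
    return result

def check_property(trials=100):
    for _ in range(trials):
        base = random.sample(range(100), k=5)
        left = base + random.sample(range(100, 200), k=3)    # replica 1 adds
        right = base + random.sample(range(200, 300), k=3)   # replica 2 adds
        merged = merge(base, left, right)
        assert set(merged) == set(left) | set(right), "missing/spurious items"
        assert len(merged) == len(set(merged)), "duplicates"
    return True
```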
- Model-checking-driven explorative testing of CRDT designs and implementations
- Effective Concurrency Testing for Distributed Systems
- Trace aware random testing for distributed systems
Verification of Local-First Software
Verification of local-first software (and distributed software in general) is challenging because it is insufficient to only reason about the local behaviour of a program. Instead, we have to account for operations that can happen concurrently on multiple devices. Existing approaches employ advanced type systems, deductive program verification, model checking or combinations thereof.
In this topic, you will explore the current state of verification approaches in the local-first domain (seminar) or implement your own/extend existing ones (project).
- LoRe: A Programming Model for Verifiably Safe Local-first Software
- Pretend synchrony: synchronous verification of asynchronous distributed programs
- Sound sequentialization for concurrent program verification
- Dynamic Partial Order Reduction for Checking Correctness against Transaction Isolation Levels
- AMC: Towards Trustworthy and Explorable CRDT Applications with the Automerge Model Checker
- Disciplined Inconsistency with Consistency Types
Version Conflicts and Schema Evolution in Decentralized Systems
Like every modern application, decentralized applications get updated from time to time. These updates sometimes introduce breaking changes, e.g., when they change/drop certain APIs or when they change the way messages in the systems are formatted.
This poses an interesting problem that is sometimes referred to as schema migration/evolution: How do we deal with the fact that different versions of the same application might co-exist in a running distributed system? Maybe some participants have already updated their app while others have not.
If we want to prevent errors due to version conflicts, we have to develop distributed applications that are ready to deal with data from the "past", but also with possible updates from the "future".
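One simple coping strategy is a "tolerant reader" that fills in defaults for fields an old writer did not know and preserves fields a future writer added. The message schema and field names below are made up for illustration:

```python
# Sketch of a tolerant reader for versioned messages: defaults cover
# data from the "past" (missing fields), and unknown fields from the
# "future" are kept rather than dropped.
def read_task(msg):
    defaults = {"title": "", "done": False, "priority": 1}
    return {**defaults, **msg}   # msg overrides defaults, extras survive
```

Edit lenses (as in Cambria, listed below) generalize this idea to bidirectional, composable transformations between schema versions.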
In this topic you will research and survey different strategies for dealing with schema evolution such as edit lenses (seminar) or you will try them out/implement your own (project).
- Live & Local Schema Change: Challenge Problems
- Cambria: Schema Evolution in Distributed Systems with Edit Lenses
Technique: Streams and Corecursion
Programs usually operate over data structures that have a finite size. For example, arrays have a definite end. On the other hand, codata refers to potentially infinite data structures. The simplest example are streams, which are sequences that can go on forever. Programming with codata introduces new challenges—for example, trying to calculate the sum of a stream may lead to a non-terminating computation—but it also enables interesting applications and techniques.
Exact Arithmetic With Digit Streams
Exact real arithmetic allows us to perform numerical computations to arbitrary precision. This contrasts with floating-point numbers, which can produce drastically imprecise results in some cases. In functional programming courses you have probably encountered Peano arithmetic (with Zero and Succ) as one of the fundamental examples of pattern matching and functional programming. Similarly, a digit stream is for example a stream of digits like 0.9999999999..., and arithmetic on digit streams can be considered a fundamental example of working with streams. The idea behind exact real arithmetic is that, while we cannot directly represent real numbers, as that would require infinite memory, we can produce streams of better and better approximations and stop when the result is precise enough for us.
Individual challenges could be to implement
- Interval Arithmetic & Midpoint Arithmetic
- Rational Numbers and Dyadic Rational Numbers
- Shrinking Intervals & Signed Digit Streams
- Eudoxus Numbers
- Algebraic Numbers
- Dedekind vs Cauchy Numbers
- Floats vs Posits.
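The stream-of-approximations idea can be sketched with Python generators; sqrt(2) via Newton's method is an illustrative choice, not one of the listed challenges:

```python
# Sketch: a real number as an infinite stream of ever better rational
# approximations, here sqrt(2) via Newton's method with exact rationals.
from fractions import Fraction

def sqrt2_approximations():
    x = Fraction(1)
    while True:                  # codata: the stream never ends
        yield x
        x = (x + 2 / x) / 2      # Newton step for x^2 = 2

def approximate(stream, precision):
    """Consume the stream until two successive approximations differ
    by less than the requested precision, then stop."""
    prev = next(stream)
    for x in stream:
        if abs(x - prev) < precision:
            return x
        prev = x
```

The consumer decides how much of the infinite stream to force, which is exactly the "stop when the result is precise enough" idea.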
Many real-world systems continually receive new data which they need to transform, analyze and aggregate. Frameworks like Apache Flink have adopted data streams as the underlying abstraction for implementing such systems.
Stream Compilers
Ideally, a stream computation framework allows programmers to use a high-level, declarative interface while producing highly efficient code. Techniques from functional programming like higher-order abstract syntax, intrinsically typed embeddings, and smart constructors might be useful for implementing such a framework.
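A tiny instance of the smart-constructor idea can be sketched in Python: the constructor inspects the stream expression it is given and fuses two consecutive maps into a single pass. The tuple-based representation is purely illustrative:

```python
# Sketch of a smart constructor for a stream DSL: building map(f)
# on top of map(g) produces one fused map(f . g) node instead of two,
# so the interpreter traverses the stream only once.
def smart_map(f, stream):
    if isinstance(stream, tuple) and stream[0] == "map":
        _, g, inner = stream
        return ("map", lambda x: f(g(x)), inner)   # fuse: map f . map g
    return ("map", f, stream)

def run(stream):
    """Interpret a stream expression down to a Python list."""
    if isinstance(stream, tuple) and stream[0] == "map":
        _, f, inner = stream
        return [f(x) for x in run(inner)]
    return list(stream)
```

A real stream compiler would apply many such rewrites and generate loop code, but the shape of the optimization is the same.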
- Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages
- Bottom-up rewriting with smart constructors, hereditary substitution & normalization by evaluation
Application: Incremental Programming
It is often a good idea to avoid redoing expensive computations. As a first approach, we can cache the result of a function call with a certain value, so that we can just return the cached result when the function is called with the same value again. However, when the input changes, then, no matter how small the change, it seems that we would need to perform the entire computation from scratch. The field of incremental computation studies how we can construct programs that reuse already computed results when the data changes. One application are databases: if you have defined an expensive query on your data, you don't want to redo the entire query once some new data is added.
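The contrast between recomputing from scratch and reacting to a change can be sketched with a trivially simple aggregate; maintaining a sum is illustrative, systems like Differential Dataflow generalize this to arbitrary dataflow programs:

```python
# Sketch: an aggregate maintained incrementally. The total is computed
# from scratch once; afterwards each small change updates the result
# in O(1) instead of rescanning all of the data.
class IncrementalSum:
    def __init__(self, data):
        self.data = list(data)
        self.total = sum(self.data)   # from-scratch computation, once

    def add(self, value):
        # React to the *change* instead of recomputing sum(self.data).
        self.data.append(value)
        self.total += value
```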
- Differential Dataflow
- Incremental λ-Calculus in Cache-Transfer Style
- DBSP: Automatic Incremental View Maintenance for Rich Query Languages
Technique: Programs are Proofs
"In 1934 Curry observed that the types of combinatory logic can be seen as axioms of intuitionistic logic. In 1969 Howard observed that natural deduction from proof theory forms a model for the lambda calculus. More informally, the return type of a function is analogous to a logical theorem, subject to hypotheses corresponding to the types of the argument; and that the program to compute that function is analogous to a proof of that theorem. This sets a form of logic programming on a rigorous foundation: proofs can be represented as programs, and especially as lambda terms, or proofs can be run." https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence
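The correspondence can be seen directly in a dependently typed language; the following sketch uses Lean 4 syntax, where one and the same term is read both as a program and as a proof:

```lean
-- Read as a type, (A → B) → A → B is the type of function application;
-- read as a formula, it is the theorem "if A implies B and A holds,
-- then B holds" (modus ponens). The term below is both the program
-- and the proof.
def modusPonens {A B : Prop} : (A → B) → A → B :=
  fun f a => f a
```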
However, history didn't stop in 1969. To this day, exciting things can be explained and discovered by exploiting proofs that can be run to give results, such as:
- Intrinsic Proofs
- Well-typed & Well-scoped interpreters
- Parametricity translation
- Logical Relations
- Type-and-Effect System
- Type-and-Coeffect System
Some recent papers we find interesting are:
- A correct-by-construction conversion from lambda calculus to combinatory logic
- On the Versatility of Open Logical Relations
- Differential Logical Relations. Part II: Increments and Derivatives
- Homotopical Patch Theory
Application: Automated Programming
The goal of automated programming is to accelerate software development by (semi-)automating the process of writing code and tests.
Property-Based Testing
In property-based testing, we state properties that a function should satisfy, such as "sorting a list does not change the length of the list" for a sorting function. The function is then tested with randomly generated inputs: if a counterexample is found that invalidates the property, the test fails; otherwise it passes.
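The mechanics can be sketched without any library: generate random inputs and search for a counterexample. Here a deliberately buggy "sort" is used for illustration; real tools such as QuickCheck or Hypothesis also shrink the counterexample to a minimal one:

```python
# Sketch: property-based testing finds a counterexample for a buggy
# sort that silently drops duplicates (violating the length property).
import random

def buggy_sort(xs):
    return sorted(set(xs))   # bug: duplicate elements are lost

def find_counterexample(trials=1000):
    random.seed(0)           # deterministic for this sketch
    for _ in range(trials):
        xs = [random.randint(0, 5) for _ in range(random.randint(0, 8))]
        if len(buggy_sort(xs)) != len(xs):
            return xs        # property violated: the length changed
    return None
```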
In a project, we could investigate and implement some procedure for the automated generation of data, following techniques from a paper. For the seminar, a few recent papers on this topic would be the following -- depending on your interest, we might also find older or more recent papers:
Program Synthesis
The goal of program synthesis is to turn a high-level specification of a problem into a program, ideally an efficient one. This usually involves searching through the (very large) space of possible programs to find the ones that satisfy our requirements. One approach is to turn the search problem into a set of logical formulas. These are then passed to an off-the-shelf SMT (Satisfiability-Modulo-Theories) solver, which (hopefully) produces a solution that satisfies the formulas. Searching through the space of programs can also be done through rewriting, where we replace part of a program with an equivalent expression. A data structure called an e-graph allows storing many similar terms in a compact fashion, making the search, also called equality saturation, more tractable.
Recently, efficient frameworks for equality saturation have been developed in Rust and Julia. These frameworks also support program analysis, where properties of the programs are inferred to aid with rewriting. Equality saturation has shown promise in making floating point computations more numerically stable and in optimizing linear algebra programs.
- Growing solver-aided languages with Rosette
- Better Together: Unifying Datalog and Equality Saturation
- babble: Learning Better Abstractions with E-Graphs and Anti-Unification
- Metatheory.jl: Fast and elegant algebraic computation in Julia with extensible equality saturation
- Optimizing Recursive Queries with Program Synthesis
- Languages with Decidable Learning: A Meta-theorem
Application: Probabilistic Programming
Probabilistic programming is, simply put, programming with probabilities. It can be used in the context of artificial intelligence or machine learning, or, more concretely, to perform random choices and then ask what the probability of certain results is. For example, consider the program:
var x: Bool = flipCoin();
var y: Bool = flipCoin();
and then consider asking to infer the probability of x == y (it should be 50%).
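For a finite program like this, inference can be sketched by exhaustively enumerating all executions together with their probabilities; the `event` interface below is illustrative:

```python
# Sketch: exact inference for the two-coin program by enumerating every
# execution path with its probability and summing the paths where the
# queried event holds.
from itertools import product

def prob(event):
    """Probability of event(x, y) over two independent fair coin flips."""
    total = 0.0
    for x, y in product([True, False], repeat=2):
        weight = 0.5 * 0.5          # each fair flip contributes 1/2
        if event(x, y):
            total += weight
    return total
```

Enumeration only works for finite (or truncated) programs; the papers below tackle recursive and continuous cases.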
In a project, we could investigate and develop some probabilistic language constructs following some paper. For the seminar, a few recent papers on this topic would be the following -- depending on your interest, we might also find older or more recent papers:
- Reasoning about Probabilistic Programs
- Exact Recursive Probabilistic Programming
- Factor Graph Grammars
- Symbolic Disintegration with a Variety of Base Measures
- The Base Measure Problem and its Solution
Project ideas:
- Implementing a simple DSL with a Probabilistic Generating Functions semantics in Lean
- Implementing a simple DSL with a Straight Through Estimator semantics
Application: Choreographic Programming
"Choreographic programming is an emerging paradigm for programming distributed systems. In choreographic programming, the programmer describes the behavior of the entire system as a single, unified program -- a choreography -- which is then compiled to individual programs that run on each node, via a compilation step called endpoint projection."
- HasChor: Functional Choreographic Programming for All
- Modular Compilation for Higher-Order Functional Choreographies
Technique: Programming with Effects
Code often has a happy path to its goal, but needs to consider a lot of error cases as well. These error cases deviate from the path and need to be handled; in software we can leverage several solutions to this problem, differing in their power:
- Error codes. A simple solution is to return an error code in case of an error and handle that.
- Exceptions. Exceptions enable a clean separation between the code that raises an exception and the code that handles it.
- Coroutines. Instead of raising an exception and thereby stopping, a coroutine can suspend execution with a question, and other code can resume the coroutine with an answer.
- Continuations. Further, it might be possible to create a copy of the current program execution state, resume execution and then later backtrack and continue execution with a different value.
At some point, we no longer consider these merely as "side" effects, i.e., things which went wrong, but instead use the suspensions and continuations as equally important parts of the primary goal of some functionality -- an effect. Effect handlers, as generalizations of exception handlers, offer interesting new ways to modularize software. Effects and co-effects can accurately represent additional capabilities of and restrictions on parts of code.
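The suspend-with-a-question, resume-with-an-answer pattern can be sketched with Python generators; this is only a rough analogue of an effect handler, and all names are illustrative:

```python
# Sketch: a computation performs an "effect" by yielding a request and
# suspending; the handler decides on an answer and resumes it. The
# handler, not the computation, chooses the effect's semantics.
def greet():
    name = yield "ask_name"        # perform the effect, suspend here
    return f"Hello, {name}!"

def handle(computation, answers):
    gen = computation()
    request = next(gen)            # run until the first effect request
    try:
        gen.send(answers[request]) # resume with the handler's answer
    except StopIteration as done:
        return done.value          # the computation's final result
```

Full effect handlers additionally allow resuming a computation zero or multiple times, which plain generators cannot express.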
In a project, we could learn how to use such an effect system and implement some case studies, or implement part of an effect compiler yourself following some papers. For the seminar, a few recent papers on this topic would be the following -- depending on your interest, we might also find older or more recent papers:
- High-Level Effect Handlers in C++
- ‘do’ unchained: embracing local imperativity in a purely functional language
- From Capabilities to Regions: Enabling Efficient Compilation of Lexical Effect Handlers
Application: Differentiable Programming
Training a neural network means optimizing the parameters which control its behavior, with respect to a loss function. The usually employed optimization algorithms, which are based on gradient descent, require computing the loss function’s gradient. This means that we need to differentiate the neural network. While this could in principle be done by hand, automatic differentiation (AD) allows computing the derivative of a given program without additional programming effort. As AD is not restricted to the specific operations used by typical neural networks, more complex constructs, for example involving control flow, can be employed in machine learning, as long as the program remains differentiable and has trainable parameters. This approach, which generalizes from deep neural networks to a broader class of programs, has been called differentiable programming.
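Forward-mode AD can be sketched in a few lines with dual numbers: every value carries its derivative, and overloaded arithmetic propagates both. Only addition and multiplication are shown here for brevity:

```python
# Sketch of forward-mode automatic differentiation via dual numbers.
class Dual:
    def __init__(self, value, deriv):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        # sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # product rule: (u * v)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

def derivative(f, x):
    return f(Dual(x, 1.0)).deriv   # seed with dx/dx = 1
```

Because the derivative rides along with ordinary evaluation, programs with control flow differentiate for free, which is exactly the appeal of differentiable programming.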
- Forward- or reverse-mode automatic differentiation: What’s the difference?
- You only linearize once: Tangents transpose to gradients
- Deep Learning with Parametric Lenses
- Reverse-Mode AD in a Functional Framework: Lambda the Ultimate Backpropagator
Technique: Programming with Modalities
Modal logic revolves around two logical operators, whose meaning is "it is necessary that X" and "it is possible that X", respectively. The principles governing these operators turn out to be useful for reasoning about other concepts as well, like knowledge (in epistemic logic) or the progress of time (in linear temporal logic).
In computing, we can interpret logical operators as type constructors and their proof rules as syntactical constructs in a programming language (see also Technique: Programs are Proofs). This view on modalities has a number of applications, from showing the absence of side effects to tracking resource usage and modelling event-driven programming.
- Idris 2: Quantitative Type Theory in Practice
- Recovering Purity with Comonads and Capabilities
- The Essence of Event-Driven Programming
- A judgmental reconstruction of modal logic
Engineering COPL-style language interpreters
Our course, Concepts of Programming Languages (not required for this topic, but certainly helpful), uses interpreters to study how specific foundational language features work.
However, a realistic language implementation requires much more than what we have discussed in our interpreters – not in terms of supported language concepts, but rather in terms of functionality expected of the implementation. These may include:
- Parsing
- Error reporting
- Compilation to machine code
- Runtime optimizations
- Syntax highlighting
- Code formatting
- Refactorings
These days, one does not have to start from scratch when implementing language tooling. Solutions that will help might include:
- parsley parser combinators
- a language server protocol implementation
- GraalVM and the Truffle framework
- Web Assembly as a compilation target
- LLVM as a compilation target
- Tree-sitter language grammars
If you choose this project topic, we will work with you to figure out an interesting slice of the problems/techniques above to work on. Feel free to already propose a focus area if you have something that interests you. None of the above lists are meant to be exhaustive – though we do ask for solutions we could potentially integrate with the actual COPL interpreters.